Dynamic application autotuning for self-aware approximate computing
Energy consumption limits application performance in a wide range of scenarios, from embedded systems to High-Performance Computing. To improve computation efficiency, this chapter focuses on a software-level methodology that enhances a target application with an adaptive layer providing self-optimization capabilities. We evaluated the benefits of dynamic autotuning in three case studies: a probabilistic time-dependent routing application from a navigation system, a molecular docking application for virtual screening, and a stereo-matching application that computes the depth of a three-dimensional scene. Experimental results show how computation efficiency can be improved by adapting both reactively and proactively.
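As an illustrative sketch of the reactive side of such an adaptive layer (not the chapter's actual interface; names, thresholds, and knob semantics below are hypothetical), a policy might raise an approximation knob when the measured latency misses its target and lower it when there is headroom:

```python
def adapt_reactively(observed_latency_ms, knob, target_ms=50, step=1,
                     knob_range=(1, 10)):
    """Toy reactive policy: trade accuracy for speed by raising an
    approximation knob when latency exceeds the target, and reclaim
    accuracy when latency is comfortably below it."""
    lo, hi = knob_range
    if observed_latency_ms > target_ms:
        knob = min(hi, knob + step)   # approximate more to run faster
    elif observed_latency_ms < 0.8 * target_ms:
        knob = max(lo, knob - step)   # headroom available: approximate less
    return knob
```

A proactive policy would instead predict the cost of the next input (as the PTDR case study below does) and set the knob before running.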
Legio: Fault Resiliency for Embarrassingly Parallel MPI Applications
Due to the increasing size of HPC machines, faults are becoming an
eventuality that applications must face. Natively, MPI provides no support
for execution past the detection of a fault, and this is becoming more and
more constraining. The User Level Fault Mitigation library (ULFM) introduced
a possible way to survive a fault during the application execution, at the
cost of code modifications. ULFM is intrusive in the application and also
requires a deep understanding of its recovery procedures.
In this paper we propose Legio, a framework that lowers the complexity of
introducing resiliency in an embarrassingly parallel MPI application. By
hiding ULFM behind the MPI calls, the library exposes resiliency features to
the application in a transparent manner, thus removing any integration
effort. Upon a fault, the failed nodes are discarded and the execution
continues only with the non-failed ones. We also propose a hierarchical
implementation of the solution to reduce the overhead of the repair process
when scaling towards a large number of nodes.
We evaluated our solutions on the Marconi100 cluster at CINECA, showing that
the overhead introduced by the library is negligible and does not limit the
scalability properties of MPI. Moreover, we also integrated the solution
into real-world applications to further prove its robustness by injecting faults.
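The discard-and-continue semantics can be sketched with a toy sequential model (this is not Legio's MPI/ULFM implementation; workers, tasks, and the failure model are all made up for illustration):

```python
import random

def run_resilient(tasks, workers, fail_prob=0.2, seed=0):
    """Toy model of discard-and-continue resiliency: when a worker fails,
    it is discarded (its task is lost, not repaired) and the surviving
    workers keep processing the remaining embarrassingly parallel work."""
    rng = random.Random(seed)
    alive = list(workers)
    results = {}
    for task in tasks:
        if not alive:
            break                      # every node failed: nothing left to run
        worker = alive[task % len(alive)]
        if rng.random() < fail_prob:   # simulated node failure
            alive.remove(worker)       # discard the failed node
            continue
        results[task] = (worker, task * task)  # stand-in for the real work
    return results, alive
```

The real library does this underneath the standard MPI calls, so the application code never sees the shrunken communicator.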
An Efficient Monte Carlo-based Probabilistic Time-Dependent Routing Calculation Targeting a Server-Side Car Navigation System
Incorporating speed probability distributions into the computation of route
planning in car navigation systems guarantees more accurate and precise
responses. In this paper, we propose a novel approach for dynamically
selecting the number of samples used in the Monte Carlo simulation that
solves the Probabilistic Time-Dependent Routing (PTDR) problem, thus
improving the computation efficiency. The proposed method proactively
determines the number of simulations needed to extract the travel-time
estimation for each specific request while respecting an error threshold as
the output quality level. The methodology requires a reduced effort on the
application development side. We adopted an aspect-oriented programming
language (LARA) together with a flexible dynamic autotuning library (mARGOt)
to instrument the code and to take tuning decisions on the number of
samples, respectively, improving the execution efficiency. Experimental
results demonstrate that the proposed adaptive approach saves a large
fraction of simulations (between 36% and 81%) with respect to a static
approach, considering different traffic situations, paths, and error
requirements. Given the negligible runtime overhead of the proposed
approach, it results in an execution-time speedup between 1.5x and 5.1x.
This speedup is reflected at the infrastructure level as a reduction of
around 36% in the computing resources needed to support the whole navigation pipeline.
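The core idea of sizing the Monte Carlo run per request can be sketched as follows (a simplified stand-in for the PTDR method; the chunk size, threshold, and log-normal travel-time model are assumptions, not the paper's):

```python
import random
import statistics

def estimate_travel_time(sample_fn, err_threshold=0.02, chunk=100,
                         max_samples=20000, seed=42):
    """Draw travel-time samples in chunks until the relative standard error
    of the mean falls below err_threshold, instead of always running a
    static, worst-case number of simulations."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < max_samples:
        samples.extend(sample_fn(rng) for _ in range(chunk))
        mean = statistics.fmean(samples)
        sem = statistics.stdev(samples) / len(samples) ** 0.5
        if sem / mean < err_threshold:
            break
    return mean, len(samples)

# Example with log-normally distributed travel times (an assumption):
mean, used = estimate_travel_time(lambda rng: rng.lognormvariate(3.0, 0.5))
```

Easy requests (low travel-time variance) stop after few chunks, which is where the reported 36%-81% savings over a static sample count come from.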
Out of kernel tuning and optimizations for portable large-scale docking experiments on GPUs
Virtual screening is an early stage in the drug discovery process that selects the most promising candidates. In the urgent computing scenario, finding a solution in the shortest time frame is critical. Any improvement in the performance of a virtual screening application translates into an increase in the number of candidates evaluated, thereby raising the probability of finding a drug. In this paper, we show how we can improve application throughput using out-of-kernel optimizations. They use input features, kernel requirements, and architectural features to rearrange the kernel inputs, executing them out of order, to improve the computation efficiency. These optimizations are implemented in an extreme-scale virtual screening application, named LiGen, which can rely on CUDA and SYCL kernels to carry out the computation on modern supercomputer nodes. Even if they are tailored to a single application, they might also be of interest for applications that share a similar design pattern. The experimental results show how these optimizations can increase kernel performance by about 2x: up to 2.2x in CUDA and up to 1.9x in SYCL. Moreover, the reported speedup can be achieved with the best proposed parameterization, as shown by the data collected and reported in this manuscript.
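A minimal sketch of the input-rearrangement idea (LiGen's actual heuristics and data layout are more involved; the `atoms` field is a hypothetical size-like feature, not LiGen's schema):

```python
def batch_out_of_order(ligands, bucket_size=32):
    """Rearrange kernel inputs out of arrival order: sort by a size-like
    feature so each batch groups similarly sized inputs, reducing padding
    and divergence when a batch is processed by a single kernel launch."""
    ordered = sorted(ligands, key=lambda lig: lig["atoms"])
    return [ordered[i:i + bucket_size]
            for i in range(0, len(ordered), bucket_size)]
```

Because every input in a bucket now has a similar cost, the kernel can be sized for the bucket instead of the worst case in the whole database.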
GPU-optimized approaches to molecular docking-based virtual screening in drug discovery: A comparative analysis
Finding a novel drug is a very long and complex procedure. Using computer simulations, it is possible to accelerate the preliminary phases by performing a virtual screening that filters a large set of drug candidates down to a manageable number. This paper presents and comparatively analyzes two GPU-optimized implementations of a virtual screening algorithm targeting novel GPU architectures. This work focuses on the analysis of parallel computation patterns and their mapping onto the target architecture. The first method adopts a traditional approach that spreads the computation for a single molecule across the entire GPU. The second uses a novel batched approach that exploits the parallel architecture of the GPU to evaluate more molecules in parallel. Experimental results show a different behavior depending on the size of the database to be screened: the implementations either reach a performance plateau sooner or have a more extended initial transient period before achieving a higher throughput (up to 5x), the latter being more suitable for extreme-scale virtual screening campaigns.
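The gap between the two mappings can be illustrated with a toy cost model counting "waves" of work on a GPU with `num_sm` streaming multiprocessors (the numbers are illustrative, not measurements from the paper):

```python
import math

def waves_single(num_sm, poses_per_molecule, molecules):
    """Traditional mapping: one molecule at a time, its poses spread across
    the whole GPU; SMs idle whenever poses_per_molecule < num_sm."""
    return molecules * math.ceil(poses_per_molecule / num_sm)

def waves_batched(num_sm, poses_per_molecule, molecules):
    """Batched mapping: one molecule per SM, so up to num_sm molecules are
    evaluated in parallel and the device stays full for large databases."""
    return math.ceil(molecules / num_sm) * poses_per_molecule
```

With 80 SMs, 8 poses per molecule, and 1000 molecules, the traditional mapping needs 1000 waves while the batched one needs 104, consistent with the batched approach paying off mainly on large databases.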
Evaluating Orthogonality between Application Auto-Tuning and Run-Time Resource Management for Adaptive OpenCL Applications
The ever-increasing number of processing units integrated on the same many-core chip delivers computational power that can exceed the performance requirements of a single application. The number of chips (and the related power consumption) can thus be reduced by serving multiple applications, a practice called resource consolidation. However, this solution requires techniques to partition and assign resources among the applications and to manage unpredictable dynamic workloads. To meet the performance requirements in such scenarios, we exploit application auto-tuning, based on design-time analysis, of both application-specific dynamic knobs and computational parallelism. These features are implemented in a software library, which is used to demonstrate the main contribution of this paper: a lightweight Run-Time Resource Management (RTRM) technique to improve resource sharing for computationally intensive OpenCL applications. We evaluate how the interaction between RTRM and application auto-tuning can become synergistic yet orthogonal. In the proposed approach, run-time adaptation decisions are taken autonomously by each application. This has two main advantages: i) a non-invasive application design, in terms of source code, and ii) a very low run-time overhead, since it requires neither central coordination by a supervisor nor communication between the applications. We carried out an experimental campaign using a video processing application, an OpenCL stereo-matching implementation, while stressing resource usage. We show that, while RTRM is necessary to provide lower variance of the application performance, the application auto-tuning layer is fundamental to trade it off against computation accuracy.
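The per-application decision can be sketched as a lookup over a design-time profile (a simplified illustration, not the library's actual API; the Pareto points and field names below are made up):

```python
def pick_config(pareto, assigned_cores, target_fps):
    """Choose the most accurate design-time configuration whose predicted
    throughput on the cores assigned by the resource manager still meets
    target_fps; if none fits, fall back to the fastest configuration,
    trading accuracy for throughput."""
    def throughput(cfg):
        return min(cfg["parallelism"], assigned_cores) * cfg["fps_per_core"]
    feasible = [c for c in pareto if throughput(c) >= target_fps]
    if not feasible:
        return max(pareto, key=throughput)
    return max(feasible, key=lambda c: c["accuracy"])
```

Since each application only reads the resources it was assigned, no supervisor coordination or inter-application communication is needed, which is what keeps the two layers orthogonal.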
An extreme-scale virtual screening platform for drug discovery
Virtual screening is one of the early stages of drug discovery that aims to
select a set of promising ligands from a vast chemical library. Molecular
docking is a crucial task in this process, and it consists of estimating
the position of a molecule inside the docking site. In the context of
urgent computing, we designed from scratch the EXSCALATE molecular docking
platform to benefit from heterogeneous computation nodes and to avoid
scaling issues.